Rank | Count | Beginning |
---|---|---|
7 | 1628 | Hann |
60 | 1278 | Í |
8 | 614 | Það |
71 | 610 | Hún |
77 | 431 | Þar |
93 | 413 | Þegar |
17 | 394 | Eftir |
13 | 284 | Þetta |
380 | 282 | Þeir |
10 | 254 | Þá |
1 | 244 | En |
3 | 221 | Um |
112 | 196 | Einnig |
366 | 184 | Þessi |
315 | 181 | Þau |
9 | 174 | Við |
48 | 166 | Til |
246 | 155 | Með |
259 | 136 | Frá |
443 | 135 | Að |
364 | 123 | Ef |
78 | 107 | “ |
16 | 104 | Ekki |
552 | 100 | Saga |
387 | 97 | Þær |
453 | 97 | Fyrir |
66 | 86 | Auk |
91 | 85 | Dáin |
43 | 84 | Síðan |
37 | 82 | Þannig |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N = 1, 2, 3, 4. This subsection starts with N = 1.
The most frequent word N-grams at the beginning of sentences give some insight into sentence composition.
For N = 1 in particular, even a small corpus is enough to identify the most frequent sentence beginnings.
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt from sentences group by substring_index(sentence, ' ', 1) order by cnt desc limit 50;
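The same count can be sketched outside the database. The following Python snippet is a minimal illustration, not the tool used to produce the table. It assumes the sentences are available as a list of strings, and the names `sentence_beginnings`, `n` and `limit` are hypothetical. The sample sentences are invented for the example and are not taken from the corpus.

```python
from collections import Counter


def sentence_beginnings(sentences, n=1, limit=50):
    """Return the `limit` most frequent sentence beginnings of n words.

    Mirrors substring_index(sentence, ' ', n): the text up to (but not
    including) the n-th space, or the whole sentence if it has fewer spaces.
    """
    counts = Counter(" ".join(s.split(" ")[:n]) for s in sentences)
    return counts.most_common(limit)


# Invented sample sentences (assumption: not corpus data).
sample = [
    "Hann fór heim.",
    "Hann kom aftur.",
    "Í gær rigndi.",
    "Það er satt.",
]

for beginning, count in sentence_beginnings(sample, n=1):
    print(beginning, count)
```

Setting `n` to 2, 3 or 4 gives the longer beginnings covered in the following subsections. Like the query, the sketch splits on single spaces only, so a quotation mark attached to the first word stays part of the beginning.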